On constant factor approximation for earth mover distance over doubling metrics

Author

  • Shi Li
Abstract

Given a metric space (X, d_X), the earth mover distance (EMD) between two distributions over X is defined as the minimum cost of a bipartite matching between the two distributions. The doubling dimension of a metric (X, d_X) is the smallest value α such that every ball in X can be covered by 2^α balls of half the radius. A metric (or a sequence of metrics) is called doubling precisely if its doubling dimension is bounded. We study efficient algorithms for approximating earth mover distance over doubling metrics. Our first result is a near-linear-time (in the size of X) algorithm for estimating EMD over a doubling metric X, with an O(α_X) approximation ratio, where α_X is the doubling dimension of X. Given a metric (X, d_X), we can use Õ(n ) preprocessing time to create a data structure of size Õ(n), such that subsequent EMD queries can be answered in Õ(n) time, with approximation ratio O(α_X/ε). Our second result is an encoding scheme, which is a weaker form of sketching. In an encoding scheme, distributions are encoded such that the EMD between two distributions can be estimated in sublinear time, given the encodings of the two distributions. In particular, given (X, d_X), by using Õ(n ) preprocessing time, every subsequent distribution μ can be encoded into F(μ) in Õ(n) time. The query for EMD between μ and ν can be answered in Õ(n) time, with approximation ratio O(α_X/ε), given the two encodings F(μ) and F(ν). The encoding scheme has immediate applications. In a two-player game where one player knows μ and the other knows ν, there is a communication protocol with small communication complexity, through which the two players can approximate the EMD between μ and ν.
Another application is a distance oracle: given a metric (X, d_X) and s distributions μ_1, μ_2, ..., μ_s over X, we can use Õ(n+sn) preprocessing time to create a data structure of size Õ(n + sn), such that the query for EMD between μ_i and μ_j can be answered in Õ(n ) time, with approximation ratio O(α_X/ε).
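
The two definitions at the start of the abstract can be written out formally. The following is a sketch in standard notation, not taken from the paper itself: it assumes X is finite and that μ and ν have equal total mass, and the symbols γ (a transport plan) and B(x, r) (the closed ball of radius r around x) are introduced here for illustration.

```latex
% Earth mover distance as a minimum-cost fractional bipartite matching
% (gamma is an assumed name for the transport plan).
\[
  \mathrm{EMD}(\mu,\nu)
  \;=\; \min_{\gamma \ge 0} \sum_{x,y \in X} \gamma(x,y)\, d_X(x,y)
  \quad \text{s.t.} \quad
  \sum_{y \in X} \gamma(x,y) = \mu(x), \qquad
  \sum_{x \in X} \gamma(x,y) = \nu(y).
\]

% Doubling dimension: every ball is covered by 2^alpha balls of half the radius.
\[
  \alpha_X \;=\; \min \Bigl\{ \alpha \;:\;
  \forall x \in X,\ \forall r > 0,\ \exists x_1,\dots,x_{2^{\alpha}} \in X
  \ \text{with}\
  B(x,r) \subseteq \bigcup_{i=1}^{2^{\alpha}} B\bigl(x_i, r/2\bigr) \Bigr\}.
\]
```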

Similar resources

Sketching Earth-Mover Distance on Graph Metrics

We develop linear sketches for estimating the Earth-Mover distance between two point sets, i.e., the cost of the minimum weight matching between the points according to some metric. While Euclidean distance and Edit distance are natural measures for vectors and strings respectively, Earth-Mover distance is a well-studied measure that is natural in the context of visual or metric data. Our work ...

Improved Approximation Algorithms for Earth-Mover Distance in Data Streams

For two multisets S and T of points in [∆]^2, such that |S| = |T| = n, the earth-mover distance (EMD) between S and T is the minimum cost of a perfect bipartite matching with edges between points in S and T, i.e., EMD(S, T) = min_{π:S→T} ∑_{a∈S} ||a − π(a)||_1, where π ranges over all one-to-one mappings. The sketching complexity of approximating earth-mover distance in the two-dimensional grid is men...
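
To make the matching formula above concrete, here is a minimal brute-force sketch that evaluates EMD(S, T) by trying every one-to-one mapping π. It is only an illustration of the definition for very small inputs (it takes n! time) and is not the streaming or sketching algorithm the paper studies; the function names `l1` and `emd_bruteforce` are hypothetical.

```python
from itertools import permutations


def l1(a, b):
    """l1 distance ||a - b||_1 between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def emd_bruteforce(S, T):
    """Exact EMD between equal-size multisets S and T under the l1 norm.

    Tries every bijection pi: S -> T, so it is only usable for tiny n.
    """
    if len(S) != len(T):
        raise ValueError("S and T must have the same number of points")
    return min(
        sum(l1(a, T[j]) for a, j in zip(S, perm))
        for perm in permutations(range(len(T)))
    )


if __name__ == "__main__":
    S = [(0, 0), (2, 2)]
    T = [(0, 1), (2, 0)]
    print(emd_bruteforce(S, T))
```

For larger inputs, an exact polynomial-time assignment solver would replace the enumeration over permutations; the enumeration is kept here because it mirrors the formula term by term.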

Impossibility of Sketching of the 3D Transportation Metric with Quadratic Cost

Transportation cost metrics, also known as the Wasserstein distances W_p, are a natural choice for defining distances between two pointsets, or distributions, and have been applied in numerous fields. From the computational perspective, there has been an intensive research effort for understanding the W_p metrics over R, with work on the W_1 metric (a.k.a. earth mover distance) being most successfu...

Rademacher-Sketch: A Dimensionality-Reducing Embedding for Sum-Product Norms, with an Application to Earth-Mover Distance

Consider a sum-product normed space, i.e. a space of the form Y = ℓ_1^n ⊗ X, where X is another normed space. Each element in Y consists of a length-n vector of elements in X, and the norm of an element in Y is the sum of the norms of its coordinates. In this paper we show a constant-distortion embedding from the normed space ℓ_1^n ⊗ X into a lower-dimensional normed space ℓ_1^{n′} ⊗ X, where n′ ≪ n is ...

Nearly-optimal bounds for sparse recovery in generic norms, with applications to k-median sketching

We initiate the study of trade-offs between sparsity and the number of measurements in sparse recovery schemes for generic norms. Specifically, for a norm ‖·‖, sparsity parameter k, approximation factor K > 0, and probability of failure P > 0, we ask: what is the minimal value of m so that there is a distribution over m × n matrices A with the property that for any x, given Ax, we can recover ...

Journal:
  • CoRR

Volume abs/1002.4034

Publication date 2010